 open science


FLIP Reasoning Challenge

Plesner, Andreas, Kuzhagaliyev, Turlan, Wattenhofer, Roger

arXiv.org Artificial Intelligence

Over the past years, advances in artificial intelligence (AI) have demonstrated how AI can solve many perception and generation tasks, such as image classification and text writing, yet reasoning remains a challenge. This paper introduces the FLIP dataset, a benchmark for evaluating AI reasoning capabilities based on human verification tasks on the Idena blockchain. FLIP challenges present users with two orderings of 4 images, requiring them to identify the logically coherent one. By emphasizing sequential reasoning, visual storytelling, and common sense, FLIP provides a unique testbed for multimodal AI systems. Our experiments evaluate state-of-the-art models, leveraging both vision-language models (VLMs) and large language models (LLMs). Results reveal that even the best open-source and closed-source models achieve maximum accuracies of 75.5% and 77.9%, respectively, in zero-shot settings, compared to human performance of 95.3%. Captioning models aid reasoning models by providing text descriptions of images, yielding better results than when using the raw images directly (75.2% with captions vs. 69.6% with raw images for Gemini 1.5 Pro). Combining the predictions from 15 models in an ensemble increases the accuracy to 85.2%. These findings highlight the limitations of existing reasoning models and the need for robust multimodal benchmarks like FLIP. The full codebase and dataset will be available at https://github.com/aplesner/FLIP-Reasoning-Challenge.
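The abstract does not say how the 15 model predictions are combined. As a rough illustration only, the sketch below assumes simple majority voting over binary "left"/"right" answers; the function names, labels, and toy predictions are hypothetical and are not taken from the FLIP codebase.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent answer among the votes.

    Ties are broken in favor of the answer seen first (Counter ordering).
    """
    return Counter(votes).most_common(1)[0][0]


def ensemble_accuracy(model_predictions, labels):
    """Accuracy of a majority-vote ensemble.

    model_predictions: list of per-model prediction lists, one entry per challenge.
    labels: ground-truth answers, one per challenge.
    """
    correct = 0
    for i, label in enumerate(labels):
        votes = [preds[i] for preds in model_predictions]
        if majority_vote(votes) == label:
            correct += 1
    return correct / len(labels)


# Toy data (made up): three models, each wrong on a different challenge.
labels = ["left", "right", "left", "right"]
model_a = ["left", "right", "right", "right"]
model_b = ["left", "left", "left", "right"]
model_c = ["right", "right", "left", "right"]

accuracy = ensemble_accuracy([model_a, model_b, model_c], labels)
print(accuracy)
```

In this toy case each model is right on three of four challenges, but because their errors fall on different challenges, the vote is correct on all four. An ensemble can only beat its members in this way when their mistakes are not strongly correlated.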


Open Science and Artificial Intelligence for supporting the sustainability of the SRC Network: The espSRC case

Garrido, J., Sánchez-Expósito, S., Ruiz-Falcó, A., Ruedas, J., Mendoza, M. Á., Vázquez, V., Parra, M., Sánchez, J., Labadie, I., Darriba, L., Moldón, J., Rodriguez-Álvarez, M., Díaz, J., Verdes-Montenegro, L.

arXiv.org Artificial Intelligence

The SKA Observatory (SKAO), a landmark project in radio astronomy, seeks to address fundamental questions in astronomy. To process its immense data output, approximately 700 PB/year, a global network of SKA Regional Centres (SRCNet) will provide the infrastructure, tools, and computational power needed for scientific analysis and scientific support. The Spanish SRC (espSRC) focuses on ensuring the sustainability of this network by reducing its environmental impact, integrating green practices into data platforms, and developing Open Science technologies to enable reproducible research. This paper discusses and summarizes part of the research and development activities that the team is conducting to reduce energy consumption at the espSRC and across SRCNet. The paper also discusses fundamental research on trusted repositories to support Open Science practices.


Typhoon T1: An Open Thai Reasoning Model

Taveekitworachai, Pittawat, Manakul, Potsawee, Tharnpipitchai, Kasima, Pipatanakul, Kunat

arXiv.org Artificial Intelligence

This paper introduces Typhoon T1, an open effort to develop an open Thai reasoning model. A reasoning model is a relatively new type of generative model built on top of large language models (LLMs). A reasoning model generates a long chain of thought before arriving at a final answer, an approach found to improve performance on complex tasks. However, details on developing such a model are limited, especially for reasoning models that can generate traces in a low-resource language. Typhoon T1 presents an open effort that dives into the details of developing a reasoning model in a more cost-effective way by leveraging supervised fine-tuning using open datasets, instead of reinforcement learning. This paper shares the details about synthetic data generation and training, as well as our dataset and model weights. Additionally, we provide insights gained from developing a reasoning model that generalizes across domains and is capable of generating reasoning traces in a low-resource language, using Thai as an example. We hope this open effort provides a foundation for further research in this field.


AI for Open Science: A Multi-Agent Perspective for Ethically Translating Data to Knowledge

Yakaboski, Chase, Hyde, Gregory, Nyanhongo, Clement, Santos, Eugene Jr

arXiv.org Artificial Intelligence

AI for Science (AI4Science), particularly in the form of self-driving labs, has the potential to sideline human involvement and hinder scientific discovery within the broader community. While prior research has focused on ensuring the responsible deployment of AI applications, enhancing security, and ensuring interpretability, we also propose that promoting openness in AI4Science discoveries should be carefully considered. In this paper, we introduce the concept of AI for Open Science (AI4OS) as a multi-agent extension of AI4Science with the core principle of maximizing open knowledge translation throughout the scientific enterprise rather than a single organizational unit. We use the established principles of Knowledge Discovery and Data Mining (KDD) to formalize a language around AI4OS. We then discuss three principal stages of knowledge translation embedded in AI4Science systems and detail specific points where openness can be applied to yield an AI4OS alternative. Lastly, we formulate a theoretical metric to assess AI4OS with a supporting ethical argument highlighting its importance. Our goal is that by drawing attention to AI4OS we can ensure the natural consequence of AI4Science (e.g., self-driving labs) is a benefit not only for its developers but for society as a whole.


The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices

Cao, Hancheng, Dodge, Jesse, Lo, Kyle, McFarland, Daniel A., Wang, Lucy Lu

arXiv.org Artificial Intelligence

In recent years, funding agencies and journals have increasingly advocated for open science practices (e.g., data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and computer science to analyze the adoption of data and method link-sharing practices over time and their impact on article reception. To identify links to data and methods, we train a neural text classification model to automatically classify URL types based on contextual mentions in papers. We find evidence that the practice of link-sharing to methods and data is spreading as more papers include such URLs over time. Reproducibility efforts may also be spreading because the same links are being increasingly reused across papers (especially in computer science), and these links are increasingly concentrated within fewer web domains (e.g., GitHub) over time. Lastly, articles that share data and method links receive increased recognition in terms of citation count, with a stronger effect when the shared links are active (rather than defunct). Together, these findings demonstrate the increased spread and perceived value of data and method sharing practices in open science.


Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective

Arvan, Mohammad, Doğruöz, A. Seza, Parde, Natalie

arXiv.org Artificial Intelligence

Reproducibility is key to scientific advancement across disciplines, and reducing barriers for open science is a focus area for the theme of Interspeech 2023. Availability of source code is one of the indicators that facilitates reproducibility. However, less is known about the rates of reproducibility at Interspeech conferences in comparison to other conferences in the field. To fill this gap, we have surveyed 27,717 papers at seven conferences across speech and language processing disciplines. We find that despite having a similar number of accepted papers to the other conferences, Interspeech has up to 40% less source code availability. In addition to reporting the difficulties we have encountered during our research, we also provide recommendations and possible directions to increase reproducibility for further studies.


Why diversity and inclusion needs to be at the forefront of future AI

Robohub

Inês Hipólito is a highly accomplished researcher, recognized for her work in esteemed journals and contributions as a co-editor. She has received research awards including the prestigious Talent Grant from the University of Amsterdam in 2021. After her PhD, she held positions at the Berlin School of Mind and Brain and Humboldt-Universität zu Berlin. Currently, she is a permanent lecturer of the philosophy of AI at Macquarie University, focusing on cognitive development and the interplay between augmented cognition (AI) and the sociocultural environment. Neurourbanism as a Novel Approach in Global Health,' funded by the Berlin University Alliance.


EleutherAI: Going Beyond "Open Science" to "Science in the Open"

Phang, Jason, Bradley, Herbie, Gao, Leo, Castricato, Louis, Biderman, Stella

arXiv.org Artificial Intelligence

Over the past two years, EleutherAI has established itself as a radically novel initiative aimed at both promoting open-source research and conducting research in a transparent, openly accessible and collaborative manner. EleutherAI's approach to research goes beyond transparency: by doing research entirely in public, anyone in the world can observe and contribute at every stage. Our work has been received positively and has resulted in several high-impact projects in Natural Language Processing and other fields. In this paper, we describe our experience doing public-facing machine learning research, the benefits we believe this approach brings, and the pitfalls we have encountered.


naab: A ready-to-use plug-and-play corpus for Farsi

Sabouri, Sadra, Rahmati, Elnaz, Gooran, Soroush, Sameti, Hossein

arXiv.org Artificial Intelligence

Huge corpora of textual data are known to be a crucial need for training deep models such as transformer-based ones. This need is more pronounced in lower-resource languages such as Farsi. We propose naab, the biggest cleaned and ready-to-use open-source textual corpus in Farsi. It contains about 130GB of data, 250 million paragraphs, and 15 billion words. The project name is derived from the Farsi word ناب (naab), which means pure and high-grade. We also provide the raw version of the corpus, called naab-raw, and an easy-to-use preprocessor that can be employed by those who want to make a customized corpus.
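For a sense of scale, the reported corpus figures can be turned into rough per-paragraph and per-word averages. This is a back-of-the-envelope sketch only: it assumes the rounded numbers from the abstract and decimal gigabytes (1 GB = 10^9 bytes), and the variable names are illustrative.

```python
# Rounded figures from the abstract; GB is assumed to mean 10**9 bytes.
corpus_bytes = 130 * 10**9
paragraphs = 250 * 10**6
words = 15 * 10**9

# Average paragraph length in words.
words_per_paragraph = words / paragraphs

# Average storage per word, including whitespace and punctuation.
bytes_per_word = corpus_bytes / words

print(f"{words_per_paragraph:.0f} words per paragraph")
print(f"{bytes_per_word:.2f} bytes per word")
```

Under these assumptions the averages come to about 60 words per paragraph and about 8.7 bytes per word.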


BLOOM Is the Most Important AI Model of the Decade

#artificialintelligence

You may be wondering if such a bold headline is true. GPT-3 came out in 2020 and established a new road the whole AI industry has been following in intention and attention since. Tech companies have repeatedly built better, larger models, one after another. But although they've put millions into the task, none of them has fundamentally changed the leading paradigm or the game's rules GPT-3 laid out two years ago. Gopher, Chinchilla, and PaLM (arguably the current podium of large language models) are significantly better than GPT-3 but they are, in essence, more of the same thing.